
    Adjusting Imperfect Data: Overview and Case Studies

    Research users of large administrative datasets have to adjust their data for quirks, problems, and issues that are inevitable when working with these kinds of data. Not all solutions to these problems are identical, and how they differ may affect how the data are to be interpreted. Some elements of the data, such as the unit of observation, remain fundamentally different, and it is important to keep that in mind when comparing data across countries. In this paper (written for Lazear and Shaw, 2007), we focus on the differences in the underlying data for a selection of country datasets. We describe two data elements that remain fundamentally different across countries -- the sampling or data collection methodology, and the basic unit of analysis (establishment or firm) -- and the extent to which they differ. We then proceed to document some of the problems that affect longitudinally linked administrative data in general, describe some of the solutions analysts and statistical agencies have implemented, and explore, through a select set of case studies, how each adjustment or the absence thereof might affect the data.

    Adjusting Imperfect Data: Overview and Case Studies

    [Excerpt] In this chapter, instead of using the similarity in the cleaned datasets to investigate economic fundamentals, we focus on the differences in the underlying ‘dirty’ data. We describe two data elements that remain fundamentally different across countries, and the extent to which they differ. We then proceed to document some of the problems that affect longitudinally linked administrative data in general, and we describe some of the solutions analysts and statistical agencies have implemented, and some that they did not implement. In each case, we explain the reasons for and against implementing a particular adjustment, and explore, through a select set of case studies, how each adjustment or absence thereof might affect the data. By giving the reader a look behind the scenes, we intend to strengthen the reader’s understanding of the data. Thus equipped, the reader can form his or her own opinion as to the degree of comparability of the findings across the different countries.
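
    As one example of the kind of problem longitudinally linked establishment data can exhibit, administrative identifiers sometimes change even though the underlying establishment continues, creating spurious "deaths" and "births" in the linked data. A minimal sketch of one common style of adjustment, flagging candidate identifier changes when a large share of an establishment's workforce moves together to a single new identifier, is given below; the column names and the 0.7 threshold are illustrative assumptions, not the chapter's actual procedure.

    # Hypothetical sketch: flag establishment exits whose workforce largely
    # reappears under a single new establishment ID in the following year.
    import pandas as pd

    def flag_id_changes(links: pd.DataFrame, threshold: float = 0.7) -> pd.DataFrame:
        """links: one row per (worker_id, year, estab_id); column names are assumed."""
        cur = links.rename(columns={"estab_id": "estab_from"})
        nxt = links.assign(year=links["year"] - 1).rename(columns={"estab_id": "estab_to"})
        # Worker-level transitions from year t to year t+1.
        flows = cur.merge(nxt, on=["worker_id", "year"])
        flows = flows[flows["estab_from"] != flows["estab_to"]]
        moved = flows.groupby(["year", "estab_from", "estab_to"]).size().rename("n_moved")
        size = cur.groupby(["year", "estab_from"]).size().rename("n_workers")
        overlap = moved.reset_index().merge(size.reset_index(), on=["year", "estab_from"])
        overlap["share"] = overlap["n_moved"] / overlap["n_workers"]
        # A high share suggests an identifier change rather than a true closure and birth.
        return overlap[overlap["share"] >= threshold]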

    Replicating the Synthetic LBD with German Establishment Data

    One major criticism of the use of synthetic data has been that the efforts necessary to generate useful synthetic data are so intense that many statistical agencies cannot afford them. However, we argue in this paper that the field is still evolving and that many lessons learned in the early years of synthetic data generation can now be used in the development of new synthetic data products, considerably reducing the required investments. We evaluate whether synthetic data algorithms that have been developed in the U.S. to generate a synthetic version of the Longitudinal Business Database (LBD) can easily be transferred to generate a similar data product for other countries. We construct a German data product with information comparable to the LBD - the German Longitudinal Business Database (GLBD) - that is generated from different administrative sources at the Institute for Employment Research, Germany. In a second stage, the algorithms developed for the synthesis of the LBD will be applied to the GLBD. Extensive evaluations will illustrate whether the algorithms provide useful synthetic data without further adjustment. The ultimate goal of the project is to provide access to multiple synthetic datasets, similar to the SynLBD at Cornell, to enable comparative studies between countries. The Synthetic GLBD is a first step towards that goal.
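
    As a rough illustration of the sequential, variable-by-variable approach that underlies many synthetic data products, the sketch below synthesizes each continuous variable from a tree model fitted on the variables synthesized before it. This is a generic CART-style synthesizer under assumed column names, not the actual SynLBD or GLBD algorithm.

    # Generic sequential synthesis sketch (illustrative only).
    import numpy as np
    import pandas as pd
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(0)

    def synthesize(confidential: pd.DataFrame, order: list) -> pd.DataFrame:
        """Synthesize the continuous columns listed in `order`, in that order."""
        synthetic = pd.DataFrame(index=confidential.index)
        # Seed the sequence by resampling the first variable with replacement.
        synthetic[order[0]] = rng.choice(confidential[order[0]].to_numpy(), size=len(confidential))
        for i, col in enumerate(order[1:], start=1):
            predictors = order[:i]
            model = DecisionTreeRegressor(min_samples_leaf=25).fit(
                confidential[predictors], confidential[col]
            )
            conf_leaf = model.apply(confidential[predictors])
            syn_leaf = model.apply(synthetic[predictors])
            draws = np.empty(len(confidential))
            for leaf in np.unique(syn_leaf):
                # Draw from the confidential values that fall in the same leaf.
                pool = confidential[col].to_numpy()[conf_leaf == leaf]
                draws[syn_leaf == leaf] = rng.choice(pool, size=(syn_leaf == leaf).sum())
            synthetic[col] = draws
        return synthetic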

    Looking back on three years of Synthetic LBD Beta

    Distributions of business data are typically much more skewed than those for household or individual data, and public knowledge of the underlying units is greater. As a result, national statistical offices (NSOs) rarely release establishment- or firm-level business microdata because of the risk to respondent confidentiality. One potential approach for overcoming these risks is to release synthetic data, in which the establishment data are simulated from statistical models designed to mimic the distributions of the real underlying microdata. The US Census Bureau's Center for Economic Studies, in collaboration with Duke University, the National Institute of Statistical Sciences, and Cornell University, made available a synthetic public use file for the Longitudinal Business Database (LBD) comprising more than 20 million records for all business establishments with paid employees dating back to 1976. The resulting product, dubbed the SynLBD, was released in 2010 and is the first-ever comprehensive business microdata set publicly released in the United States, including data on establishments' employment and payroll, birth and death years, and industrial classification. This paper documents the scope of projects that have requested and used the SynLBD.

    Science, Confidentiality, and the Public Interest

    We describe the benefits of providing data to public agencies, and how public agencies navigate the narrow path between too much information disclosure on the one hand, and the release of useful information on the other.

    metajelo: A Metadata Package for Journals to Support External Linked Objects

    We propose a metadata package that is intended to provide academic journals with a lightweight means of registering, at the time of publication, the existence and disposition of supplementary materials. Information about the supplementary materials is, in most cases, critical for the reproducibility and replicability of scholarly results. In many instances, these materials are curated by a third party, which may or may not follow developing standards for the identification and description of those materials. As such, the vocabulary described here complements existing initiatives that specify vocabularies to describe the supplementary materials or the repositories and archives in which they have been deposited. Where possible, it reuses elements of relevant other vocabularies, facilitating coexistence with them. Furthermore, it provides an “at publication” record of the reproducibility characteristics of a particular article that has been selected for publication. The proposed metadata package documents the key characteristics that journals care about in the case of supplementary materials held by third parties: existence, accessibility, and permanence. It does so in a robust, time-invariant fashion at the time of publication, when the editorial decisions are made. It also allows for better documentation of less accessible (non-public) data by treating it symmetrically from the point of view of the journal, thereby increasing the transparency of what up until now has been very opaque.
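
    As an informal illustration, a record capturing the three properties highlighted above (existence, accessibility, and permanence) might look like the sketch below. The field names are hypothetical and do not reproduce the actual metajelo vocabulary.

    # Hypothetical record; field names are illustrative, not the metajelo schema.
    from dataclasses import dataclass, asdict
    import json

    @dataclass
    class SupplementaryMaterialRecord:
        article_doi: str
        material_identifier: str        # e.g. a DOI or repository handle
        repository_name: str
        exists_at_publication: bool     # existence, as verified by the journal
        access_level: str               # accessibility: "public", "restricted", ...
        preservation_policy_url: str    # permanence: the repository's stated policy

    record = SupplementaryMaterialRecord(
        article_doi="10.xxxx/example-article",
        material_identifier="10.yyyy/example-dataset",
        repository_name="Example Data Archive",
        exists_at_publication=True,
        access_level="restricted",
        preservation_policy_url="https://example.org/preservation-policy",
    )
    print(json.dumps(asdict(record), indent=2))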

    Presentation: Did the Housing Price Bubble Clobber Local Labor Market Job and Worker Flows When It Burst?

    We integrate local labor market data on worker flows, job flows, employment levels, and earnings with MSA-level data on housing prices and local area unemployment to study the local labor market dynamics associated with the U.S. housing price bubble of the late 2000s. We proceed to study the magnitude and timing of the relation between changes in local housing prices and local worker and job flows and local labor market earnings. In addition to the unique contribution of using both local labor and housing market data, the paper also considers the contributions of the aggregate movements in the worker and job flows to the heterogeneous local labor market outcomes.

    Displaced workers, early leavers, and re-employment wages

    When receiving information about an imminent plant closure or mass layoffs, workers search for new jobs. This has been the premise of advance notice legislation, but it has been difficult to verify using survey data. In this paper, we lay out a search model that explicitly takes into account the information flow prior to a mass layoff. Using universal wage data files that allow us to identify individuals working for healthy and displacing firms both at the time of displacement and at any other time period, we test the predictions of the model on re-employment wages. Controlling for worker quality and unobservable firm characteristics, workers leaving a "distressed" firm have higher re-employment wages than workers who stay with the distressed firm until displacement.
    Keywords: displaced workers, search theory, advance notice, linked firm-worker data sets
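
    A simplified sketch of the kind of comparison described above, not the paper's exact specification, would regress log re-employment wages on an early-leaver indicator with worker controls and origin-firm dummies; all column names are assumptions.

    # Illustrative regression sketch (assumed column names).
    import pandas as pd
    import statsmodels.formula.api as smf

    def early_leaver_premium(df: pd.DataFrame):
        """df columns (hypothetical): log_wage_new, early_leaver (0/1),
        tenure, experience, origin_firm_id."""
        model = smf.ols(
            "log_wage_new ~ early_leaver + tenure + experience + C(origin_firm_id)",
            data=df,
        )
        result = model.fit(cov_type="cluster", cov_kwds={"groups": df["origin_firm_id"]})
        # A positive early_leaver coefficient is consistent with higher re-employment
        # wages for workers who leave the distressed firm before displacement.
        return result.params["early_leaver"], result.bse["early_leaver"]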

    Testimony of John M. Abowd Before the House Committee on Energy and Commerce, Subcommittee on Commerce, Manufacturing and Trade, United States House of Representatives

    We focus attention on gross flows in the labor market and their role in economic reallocation. Economists distinguish between movements of individuals (gross worker flows) and those associated with businesses (gross job flows). The gross worker flows are accessions (hiring and recalls) and separations (quits, layoffs, retirements, and firings). The gross job flows are creations (increases in the employment of a given business establishment) and destructions (decreases in the employment of a given business establishment). In our testimony, we discuss the different flows and the regional variation therein over the last recession.
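
    The flow concepts can be made concrete with a small sketch that computes them from a hypothetical linked employer-employee panel; the column names and quarterly frequency are assumptions for illustration.

    # Compute gross worker flows and gross job flows between two quarters.
    import pandas as pd

    def gross_flows(panel: pd.DataFrame, t0: int, t1: int) -> dict:
        """panel: one row per (worker_id, establishment_id, quarter) employment match."""
        prev = panel[panel["quarter"] == t0]
        cur = panel[panel["quarter"] == t1]

        # Gross worker flows: accessions are matches present in t1 but not t0;
        # separations are matches present in t0 but not t1.
        key = ["worker_id", "establishment_id"]
        prev_jobs = set(map(tuple, prev[key].to_numpy()))
        cur_jobs = set(map(tuple, cur[key].to_numpy()))
        accessions = len(cur_jobs - prev_jobs)
        separations = len(prev_jobs - cur_jobs)

        # Gross job flows: sum establishment-level employment gains (creation)
        # and losses (destruction).
        change = (
            cur.groupby("establishment_id").size()
            .subtract(prev.groupby("establishment_id").size(), fill_value=0)
        )
        creation = int(change[change > 0].sum())
        destruction = int(-change[change < 0].sum())
        return {"accessions": accessions, "separations": separations,
                "job_creation": creation, "job_destruction": destruction}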